An In-Depth Analysis of the Social Media Platform Trell
Introduction
Welcome to our data story project, where we embark on an exciting journey to explore the depths of the social media platform Trell. Through a series of interlinked visualizations and explanatory text, we aim to unravel the intricate relationships within Trell’s user data and shed light on the factors influencing user engagement.
Trell, a popular social media platform, offers users a unique space to discover, create, and share their experiences through captivating visual content. In this project, we dive into a comprehensive dataset that encompasses a wide range of attributes related to Trell’s users. From user demographics and activity patterns to engagement metrics and content preferences, our dataset provides a rich foundation for uncovering fascinating insights.
Before we delve into the analysis, we preprocess the dataset to ensure data quality and relevance. In practice, this means dropping missing values for the attributes being compared and filtering out outliers, such as bot accounts, with Tukey's fences (described in the Dataset and preprocessing section). This groundwork keeps our subsequent analyses and visualizations accurate and informative.
Throughout the project, we actively seek feedback from our Teaching Assistant (TA) and peers, recognizing the value of diverse perspectives in refining our analysis and improving the clarity of our visualizations. This iterative process enables us to present a compelling data story that effectively communicates the insights derived from the Trell dataset.
Join us on this captivating journey as we uncover the correlations between various attributes within Trell and unravel the secrets behind user engagement patterns. Through the fusion of data, visualizations, and explanatory text, we hope to empower researchers, marketers, and enthusiasts with a deeper understanding of the dynamic landscape of Trell.
Our perspectives
Perspective 1: Trell’s Perspective
From Trell’s standpoint, understanding user behavior and preferences is crucial for improving the platform’s functionality and enhancing user satisfaction. Through this perspective, we delve into the dataset to uncover valuable insights that can inform strategic decisions and shape Trell’s future development.
We examine the correlations between attributes such as user demographics, content viewing patterns, and engagement metrics to gain a comprehensive understanding of Trell’s user base. By analyzing trends related to weekends vs. weekdays, timeslots of content consumption, and the impact of hashtags and emojis on user interactions, we aim to provide Trell with valuable insights to optimize user experience and drive platform growth.
Argument #1: The best content is (probably) created by males over 30.
Figure 3: Males over 30 create the most content of all age and gender groups. Because they create the most content, it is likely that many of Trell's best creators are males over 30, assuming the share of high-quality videos does not differ much between groups. A viewer can therefore (probably) find the highest-quality videos by watching content created by men over 30.
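The ranking behind Figure 3 can be sketched as a pandas group-by. This is a minimal sketch, not the code used for the figure. The column names `gender`, `age_group` and `creations` and their label values are assumptions, since the real CSV may encode gender and age group as numbers. The toy DataFrame stands in for `train_age_dataset.csv`. Summing gives total uploads per group. Swapping `.sum()` for `.mean()` would compare uploads per user instead, which is not inflated by a group simply having more users.

```python
import pandas as pd


def creations_by_group(df: pd.DataFrame) -> pd.Series:
    """Total number of creations for every (gender, age_group) combination."""
    # Hypothetical column names; the real CSV may encode both groups as numbers
    return df.groupby(["gender", "age_group"])["creations"].sum()


# Toy data with made-up labels and counts
toy = pd.DataFrame({
    "gender": ["male", "male", "female", "female", "male"],
    "age_group": ["30+", "30+", "<18", "30+", "<18"],
    "creations": [5, 7, 2, 3, 1],
})

totals = creations_by_group(toy)
top_group = totals.idxmax()  # (gender, age_group) tuple with the most creations
print(totals)
print("Most content created by:", top_group)
```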
Perspective 2: Content Creator on Trell
As a content creator on Trell, you play a vital role in shaping the platform’s landscape and engaging with its user base. Through this perspective, we aim to provide insights into the factors that contribute to your success and help you optimize your content creation strategy.
By analyzing the dataset, we explore the correlation between various attributes and the content creator’s performance on Trell. We investigate factors such as following rate, average age of followers, and repetitive punctuation usage to understand their impact on content reach and engagement. Through visualizations and data-driven analysis, we aim to empower content creators with actionable insights to enhance their content’s visibility and impact.
Argument #1: The best time to upload is between 18:00 and 00:00.
Figure 1: This histogram shows that, of the four 6-hour intervals in a day, 18:00-00:00 is the one in which Trell users watch the most videos. Assuming the time watched per video stays roughly the same across intervals, this means that the most users are active on Trell during that interval. If you want your videos to be watched as much as possible, upload them when most users are online.
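Totals like the ones in Figure 1 can be computed by summing the per-slot columns over all users. This is a minimal sketch under several assumptions:

- `slot2_trails_watched_per_day` to `slot4_trails_watched_per_day` appear in this document's correlation output.
- `slot1_trails_watched_per_day` is assumed by analogy.
- The mapping of slot numbers to clock times is not stated. If the slots run chronologically from 00:00, slot 4 would be 18:00-00:00.

The toy values are made up.

```python
import pandas as pd

# Assumed column names: slot2-slot4 appear in this document's correlation output;
# slot1 is assumed by analogy. The slot-to-clock-time mapping is also an assumption.
SLOT_COLUMNS = [f"slot{i}_trails_watched_per_day" for i in range(1, 5)]


def views_per_slot(df: pd.DataFrame) -> pd.Series:
    """Sum the trails watched per day in each slot over all users."""
    return df[SLOT_COLUMNS].sum()


# Toy data standing in for train_age_dataset.csv (values are made up)
toy = pd.DataFrame({
    "slot1_trails_watched_per_day": [0.1, 0.0, 0.2],
    "slot2_trails_watched_per_day": [0.3, 0.1, 0.2],
    "slot3_trails_watched_per_day": [0.4, 0.5, 0.3],
    "slot4_trails_watched_per_day": [0.9, 0.6, 0.8],
})

totals = views_per_slot(toy)
busiest = totals.idxmax()                # slot with the most views
rising = totals.is_monotonic_increasing  # True if no slot has fewer views than the one before
print(totals)
print("Busiest slot:", busiest, "| rising through the day:", rising)
```

The `rising` flag is the same check that underlies the claim, made for Figure 1, that views grow from slot to slot.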
Argument #2: The best age and gender group to focus on is girls under 18.
Figure 2, 4: Figure 2 shows that people under 18 watch by far the most videos on Trell. Figure 4 shows that females spend more time on the app than males: the first quartile, median, third quartile and upper fence of the females' boxplot are all higher than those of the males' boxplot. Combining these two findings suggests that, if you want your videos to be watched as much as possible, you should target girls under 18.
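The boxplot comparison in Figure 4 can also be checked numerically by computing the quartiles per group. This is a minimal sketch, assuming a time-spent column named `avgTimeSpent` and a `gender` column with readable labels (both hypothetical here). It reports Tukey's upper bound, Q3 + 1.5 * IQR. A box plot's upper whisker (often labelled the upper fence) ends at the largest observation at or below that bound, so the two numbers are related but not identical. Quartile interpolation conventions also differ between tools, so small deviations from the plotted boxes are possible.

```python
import pandas as pd


def boxplot_stats(df: pd.DataFrame, group_col: str, value_col: str, k: float = 1.5) -> pd.DataFrame:
    """Q1, median, Q3 and Tukey's upper bound (Q3 + k * IQR) of value_col per group."""
    grouped = df.groupby(group_col)[value_col]
    stats = pd.DataFrame({
        "q1": grouped.quantile(0.25),
        "median": grouped.median(),
        "q3": grouped.quantile(0.75),
    })
    stats["upper_bound"] = stats["q3"] + k * (stats["q3"] - stats["q1"])
    return stats


# Toy data; the 'gender' labels and the 'avgTimeSpent' column name are assumptions
toy = pd.DataFrame({
    "gender": ["female"] * 4 + ["male"] * 4,
    "avgTimeSpent": [10, 20, 30, 40, 5, 10, 15, 20],
})

stats = boxplot_stats(toy, "gender", "avgTimeSpent")
print(stats)
# Does the female box sit above the male box on every summary statistic?
female_higher = (stats.loc["female"] > stats.loc["male"]).all()
print("Female values higher on all statistics:", female_higher)
```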
Perspective 3: Viewer on Trell
As a viewer on Trell, you are an integral part of the platform’s ecosystem, consuming and engaging with the captivating content created by its users. Through this perspective, we aim to uncover insights that enhance your viewing experience and provide a deeper understanding of the content you encounter on Trell.
By analyzing the dataset, we explore the correlations between user attributes and viewing patterns, seeking to understand the factors that drive your engagement and preferences on Trell. We examine variables such as content duration, completion rates, and the impact of comments on content relevance to uncover trends that shape your viewing habits.
Argument #1: Trell needs progressively more support staff as the day goes on to provide the best service.
Figure 1: This histogram shows that each slot has more videos watched than the slot before it. Assuming the time watched per video stays the same, this means that more and more users become active on Trell as the day progresses. More active users generally means more users running into problems with the app, so Trell needs more staff for troubleshooting later in the day.
Argument #2: It’s smart to target different age and gender groups with different Trell ads.
Figure 2, 3: These figures contrast the people who upload the most with the people who watch the most. Figure 2 shows that mainly younger people watch videos, while Figure 3 shows that mainly older people, especially men, create them. Because the age groups that watch and the age groups that upload differ so much, it makes sense to advertise Trell's viewing features to younger people and its content-creation features to older people.
Dataset and preprocessing
Our dataset, ‘train_age_dataset.csv’, can be found at https://www.kaggle.com/datasets/adityak80/trell-social-media-usage-data?resource=download&select=train_age_dataset.csv. It can be used to find correlations between user attributes and how many videos users watch or how long they look at a post on average. Apart from dropping missing values when computing correlations, the only preprocessing step we used was Tukey’s fences. We used the standard value k = 1.5 to remove outliers, because we wanted to filter out bots, for example accounts with an average watch time of 6 million seconds per video.
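Tukey's fences keep values in the interval [Q1 - k * IQR, Q3 + k * IQR], where IQR = Q3 - Q1. The snippet below is a minimal sketch of that filter with the standard k = 1.5. The helper name `tukey_filter` and the column `avgTimeSpent` are hypothetical, and which columns the original analysis filtered is not stated. In the toy data, one bot-like row with about 6 million seconds of watch time per video falls far above the upper fence and is removed.

```python
import pandas as pd


def tukey_filter(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose value in `column` lies inside Tukey's fences.

    Fences: [Q1 - k * IQR, Q3 + k * IQR] with IQR = Q3 - Q1 (bounds inclusive).
    Rows with a missing value in `column` are dropped as well.
    """
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[column].between(lower, upper)]


# Toy data: five ordinary users and one bot-like account with ~6 million seconds per video
toy = pd.DataFrame({"avgTimeSpent": [10, 12, 11, 13, 12, 6_000_000]})
cleaned = tukey_filter(toy, "avgTimeSpent")
print(cleaned)
```

Here Q1 = 11.25 and Q3 = 12.75, so the fences are [9.0, 15.0] and only the bot-like row falls outside them.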
```python
# Imports
from itertools import combinations

import pandas as pd
from scipy.stats import pearsonr
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as pyo
import numpy as np

pyo.init_notebook_mode()

# Calculate Pearson's r for every pair of numeric attributes
# Read the CSV file into a pandas DataFrame
df = pd.read_csv('train_age_dataset.csv')
numeric_columns = df.select_dtypes(include='number').columns

list_corr = []
# combinations() yields each unordered pair once, so no de-duplication is needed
for column, target_column in combinations(numeric_columns, 2):
    # Drop rows where either attribute is missing
    df_cleaned = df.dropna(subset=[column, target_column])
    # Extract the two attributes as separate Series from the DataFrame
    x = df_cleaned[column]
    y = df_cleaned[target_column]
    # Calculate Pearson's correlation coefficient and p-value (p-value unused here)
    corr, p_value = pearsonr(x, y)
    # Skip pairs where r is undefined, e.g. when one attribute is constant
    if np.isnan(corr):
        continue
    # Store the pair with the attribute names in alphabetical order
    first, second = sorted((column, target_column))
    list_corr.append([corr, first, second])

# Sort ascending by r and print the 10 strongest positive correlations
list_corr.sort()
print(list_corr[-10:])
```
[[0.7359130644246615, 'slot2_trails_watched_per_day', 'weekdays_trails_watched_per_day'], [0.7472930535908913, 'content_views', 'slot2_trails_watched_per_day'], [0.7619040569337124, 'avgComments', 'num_of_comments'], [0.7766766416363531, 'content_views', 'weekends_trails_watched_per_day'], [0.7896744553131418, 'slot3_trails_watched_per_day', 'weekdays_trails_watched_per_day'], [0.7943382967746, 'slot4_trails_watched_per_day', 'weekdays_trails_watched_per_day'], [0.7951807026891985, 'content_views', 'slot4_trails_watched_per_day'], [0.7958924268797808, 'content_views', 'slot3_trails_watched_per_day'], [0.9275480634476255, 'content_views', 'weekdays_trails_watched_per_day'], [0.9396388917332891, 'followers_avg_age', 'following_avg_age']]
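pandas can compute the same pairwise correlation matrix in one call. `DataFrame.corr(method="pearson")` excludes missing values pair by pair, which mirrors the per-pair `dropna` in the loop above. The sketch below is one way to do this; the helper name `top_correlations` is hypothetical. It reads each unordered pair once from the matrix, drops undefined values, and sorts. Unlike `pearsonr`, it does not return p-values. Like the loop, it ranks by signed r, so finding strong negative correlations would require sorting by absolute value instead. It runs on random toy data rather than the Trell CSV.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 10) -> pd.Series:
    """Return the n largest Pearson correlations between distinct numeric columns."""
    # Pearson r for every pair of numeric columns; missing values are excluded pairwise
    corr = df.select_dtypes(include="number").corr(method="pearson")
    # Each unordered pair once (the upper triangle); the diagonal r = 1 is skipped
    pairs = pd.Series({
        (x, y): corr.loc[x, y] for x, y in combinations(corr.columns, 2)
    })
    return pairs.dropna().sort_values(ascending=False).head(n)


# Toy data: "b" is almost a linear function of "a", "c" is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
    "label": ["user"] * 200,  # non-numeric column, ignored by select_dtypes
})

top = top_correlations(toy, n=2)
print(top)
```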